Cepstral compensation by polynomial approximation for environment-independent speech recognition
Abstract
Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One possible solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions to the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech, or relied on the availability of large environment-specific adaptation sets. Some of the previous methods required adaptation data consisting of simultaneously recorded, or “stereo”, recordings of clean and degraded speech. In this paper we introduce an approximation-based method to compute the effects of the environment on the parameters of the PDF of clean speech. We perform compensation by Vector Polynomial approximationS (VPS) for the effects of linear filtering and additive noise on the clean speech. We also estimate the parameters of the environment, namely the noise and the channel, by using piecewise-linear approximations of these effects. We evaluate the performance of VPS using the CMU SPHINX-II system and the 100-word alphanumeric CENSUS database. Performance is evaluated at several SNRs, with artificial white Gaussian noise added to the database. VPS provides improvements of up to 15 percent in relative recognition accuracy.
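As a rough illustration of the idea of replacing an intractable environment function with a polynomial, the sketch below works on a single log-spectral channel. It assumes the commonly used model y = x + h + log(1 + exp(n - x - h)) for clean speech x, channel h, and noise n, which the abstract does not state, and it fits a scalar least-squares polynomial rather than the vector polynomials of VPS. The function names `mismatch`, `fit_polynomial`, and `noisy_mean` are hypothetical and are not taken from the paper.

```python
import numpy as np

def mismatch(x, n, h=0.0):
    """Assumed environment model in the log-spectral domain:
    y = x + h + log(1 + exp(n - x - h))."""
    return x + h + np.logaddexp(0.0, n - x - h)

def fit_polynomial(mu_x, sigma_x, n, h=0.0, degree=2, num_points=200):
    """Least-squares polynomial fit of y as a function of (x - mu_x),
    sampled uniformly over mu_x +/- 3 sigma_x (an illustrative choice)."""
    xs = np.linspace(mu_x - 3.0 * sigma_x, mu_x + 3.0 * sigma_x, num_points)
    return np.polyfit(xs - mu_x, mismatch(xs, n, h), degree)

def noisy_mean(coeffs, sigma_x):
    """Mean of the fitted polynomial when x ~ N(mu_x, sigma_x^2).
    Uses the Gaussian central moments: E[(x - mu)^k] is 0 for odd k
    and sigma^k * (k - 1)!! for even k."""
    degree = len(coeffs) - 1
    total = 0.0
    for i, c in enumerate(coeffs):
        k = degree - i  # np.polyfit returns the highest power first
        if k % 2 == 0:
            double_factorial = float(np.prod(np.arange(k - 1, 0, -2)))
            total += c * sigma_x ** k * double_factorial
    return total

# Example: clean-speech Gaussian at 10 with unit variance, noise at 10.
coeffs = fit_polynomial(mu_x=10.0, sigma_x=1.0, n=10.0)
compensated_mean = noisy_mean(coeffs, sigma_x=1.0)
```

Because the polynomial is linear in its coefficients, the mean of the degraded speech follows in closed form from the moments of the clean-speech Gaussian, which is what makes this kind of approximation attractive for updating PDF parameters without stereo data.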
Similar resources
Computationally Efficient Cepstral Domain Feature Compensation
In this letter, we propose a novel approach to feature compensation performed in the cepstral domain. Processing in the cepstral domain has the advantage that the spectral correlation among different frequencies is taken into consideration. By introducing a linear approximation with diagonal covariance assumption, we modify the conventional log-spectral domain feature compensation technique to ...
Signal Processing for Robust Speech Recognition
This paper describes several new cepstral-based compensation procedures that render the SPHINX-II system more robust with respect to acoustical environment. The first algorithm, phone-dependent cepstral compensation, is similar in concept to the previously described MFCDCN method, except that cepstral compensation vectors are selected according to the current phonetic hypothesis, rather than on ...
Automatic Speech Recognition in GSM Network Using the Bit-Stream and Auxiliary parameters
The Global System for Mobile (GSM) environment poses three main problems for Automatic Speech Recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The second, source coding distortion, must be explicitly addressed. The front-end of the speech recognition system combines features extracted by converting the quantized spectral information of speech coder, p...
Environment normalization for robust speech recognition using direct cepstral comparison
In this paper we describe and evaluate a series of new algorithms that compensate for the effects of unknown acoustical environments (or changes in environment) through the use of compensation vectors that are added to the cepstral representations of speech that is input to a speech recognition system. These compensation vectors are obtained from direct frame-by-frame comparisons of the cepstra...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...